MS 7.118 - pg 364 Extinct New Zealand birds. Refer to
the Evolutionary Ecology Research (July 2003) study of the New Zealand
bird population prior to European contact, Exercise 1.12 (p. 6). Two
quantitative variables measured for each of the 116 bird species were
body mass (grams) and egg length (millimeters). Descriptive statistics
for these variables are shown on the MINITAB printout below.
knitr::include_graphics("ass4Q1-1.png")
# Large-sample 95% confidence intervals from the MINITAB summary statistics
# (n = 116 species, per the exercise statement; ybar +/- z * s / sqrt(n) with z = qnorm(0.975))
n = 116
z = qnorm(0.975)
# Mass 95% confidence Interval (grams)
round(9113 + c(-1, 1) * z * 31457 / sqrt(n), 2)
## [1]  3388.52 14837.48
# Egg 95% confidence Interval (millimeters)
round(61.06 + c(-1, 1) * z * 45.46 / sqrt(n), 2)
## [1] 52.79 69.33
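Interpretation (both for mass and egg length): we are 95% confident that each interval contains the true mean \(\mu\) of that variable for all New Zealand bird species. The 95% describes the method, not the data; it does not mean that 95% of the individual measurements fall inside the interval.
Whether this particular interval contains \(\mu\) cannot be known, because \(\mu\) is unknown. What we do know is that, in repeated sampling, about 95% of the intervals constructed this way would contain \(\mu\) (both for mass and egg length).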
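The repeated-sampling meaning of "95% confident" can be illustrated by simulation. This is a minimal sketch under an assumed, hypothetical normal population (mean 60, standard deviation 45, samples of n = 116; illustrative values loosely resembling the egg-length summary, not the study's data). It draws many samples, builds the large-sample z interval from each, and records the fraction of intervals that contain the true mean.

```r
# Coverage of the large-sample 95% z interval for a mean, by simulation.
# Hypothetical population: normal, mean 60, sd 45 (illustrative values only).
set.seed(1)
mu <- 60; sigma <- 45; n <- 116; reps <- 10000
z <- qnorm(0.975)

covers <- replicate(reps, {
  y <- rnorm(n, mean = mu, sd = sigma)              # one simulated sample
  half <- z * sd(y) / sqrt(n)                       # margin of error
  (mean(y) - half <= mu) && (mu <= mean(y) + half)  # TRUE if this interval captures mu
})

coverage <- mean(covers)  # proportion of intervals that contain mu
coverage
```

The proportion should land near 0.95, while any single interval either contains mu or does not.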
Already Done.
knitr::include_graphics("ass4Q1-2.png")
c(21/38, 7/78)
## [1] 0.55263158 0.08974359
No, the interval does not contain 1.
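The two sample proportions above can be turned into an interval estimate. A sketch, assuming this part asks for a large-sample 95% interval for the difference p1 - p2 using the counts 21 of 38 and 7 of 78 read off the fractions above; prop_diff_ci is a hypothetical helper name. For a difference in proportions, the value corresponding to "no difference" is 0.

```r
# Large-sample (Wald) confidence interval for p1 - p2.
prop_diff_ci <- function(x1, n1, x2, n2, conf = 0.95) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  z <- qnorm(1 - (1 - conf) / 2)                      # 1.96 for 95%
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) # unpooled standard error
  (p1 - p2) + c(-1, 1) * z * se
}

ci_p <- prop_diff_ci(21, 38, 7, 78)
round(ci_p, 4)
```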
MS 7.120 - pg 365 Strength of epoxy-repaired joints. The
methodology for conducting a stress analysis of newly designed timber
structures is well known. However, few data are available on the actual
or allowable stress for repairing damaged structures. Consequently,
design engineers often propose a repair scheme (e.g., gluing) without
any knowledge of its structural effectiveness. To partially fill this
void, a stress analysis was conducted on epoxy-repaired truss joints
(Journal of Structural Engineering, Feb. 1986). Tests were conducted on
epoxy-bonded truss joints made of various species of wood to determine
actual glue-line shear stress recorded in pounds per square inch (psi).
Summary information for independent random samples of southern pine and
ponderosa pine truss joints is given in the accompanying table.
knitr::include_graphics("ass4Q2.png")
Estimate the difference between the mean shear strengths of epoxy-repaired truss joints for the two species of wood with a 90% confidence interval. \[(\bar{y}_1-\bar{y}_2) \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} \Rightarrow (1312-1352) \pm 1.645\sqrt{\frac{422^2}{100}+\frac{271^2}{47}}\] \[\Rightarrow -40 \pm 95.118 \Rightarrow (-135.118,\ 55.118)\]
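The arithmetic above can be reproduced directly from the summary values. A minimal sketch; diff_mean_ci is a hypothetical helper, and it uses qnorm(0.95) = 1.6449 instead of the rounded table value 1.645, so its endpoints differ from the hand calculation by about 0.01.

```r
# Large-sample CI for mu1 - mu2 from summary statistics.
diff_mean_ci <- function(ybar1, s1, n1, ybar2, s2, n2, conf = 0.90) {
  z <- qnorm(1 - (1 - conf) / 2)      # about 1.645 for 90%
  se <- sqrt(s1^2 / n1 + s2^2 / n2)   # standard error of ybar1 - ybar2
  (ybar1 - ybar2) + c(-1, 1) * z * se
}

ci_shear <- diff_mean_ci(1312, 422, 100, 1352, 271, 47)
round(ci_shear, 2)
```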
Construct a 90% confidence interval for the ratio of the shear stress variances of epoxy-repaired truss joints for the two species of wood. Based on this interval, is there evidence to indicate that the two shear stress variances differ? Explain. \[\frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{.05}(99,46)} \leq \frac{\sigma_1^2}{\sigma_2^2} \leq \frac{s_1^2}{s_2^2}\cdot F_{.05}(46,99) \Rightarrow \frac{422^2}{271^2}\cdot\frac{1}{1.54818} \leq \frac{\sigma_1^2}{\sigma_2^2} \leq \frac{422^2}{271^2}(1.49194) \Rightarrow 1.566 \leq \frac{\sigma_1^2}{\sigma_2^2} \leq 3.618\] Yes. The entire interval lies above 1, so a variance ratio of 1 (equal variances) is not plausible at 90% confidence; the data indicate that \(\sigma_1^2\) is larger than \(\sigma_2^2\).
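The F critical values used above can be obtained with qf() instead of a table. A sketch; var_ratio_ci is a hypothetical helper. Note that qf() takes a lower-tail probability, so the upper-tail point \(F_{.05}\) corresponds to qf(0.95, ...).

```r
# 90% CI for sigma1^2 / sigma2^2 based on the F distribution.
var_ratio_ci <- function(s1, n1, s2, n2, conf = 0.90) {
  a <- 1 - conf
  ratio <- s1^2 / s2^2
  lower <- ratio / qf(1 - a / 2, n1 - 1, n2 - 1)  # divide by F_{a/2}(n1-1, n2-1)
  upper <- ratio * qf(1 - a / 2, n2 - 1, n1 - 1)  # multiply by F_{a/2}(n2-1, n1-1)
  c(lower, upper)
}

ci_var <- var_ratio_ci(422, 100, 271, 47)
round(ci_var, 3)
```

The interval is judged against 1, the value of the ratio when the two variances are equal.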
myboot<-function(iter=10000,x,fun="mean",alpha=0.05,...){ #Notice where the ... is repeated in the code
n=length(x) #sample size
y=sample(x,n*iter,replace=TRUE)
rs.mat=matrix(y,nr=n,nc=iter,byrow=TRUE)
xstat=apply(rs.mat,2,fun) # xstat is a vector and will have iter values in it
ci=quantile(xstat,c(alpha/2,1-alpha/2))# Nice way to form a confidence interval
# A histogram follows
# The object para will contain the parameters used to make the histogram
para=hist(xstat,freq=FALSE,las=1,
main=paste("Histogram of Bootstrap sample statistics","\n","alpha=",alpha," iter=",iter,sep=""),
...)
#mat will be a matrix that contains the data, this is done so that I can use apply()
mat=matrix(x,nr=length(x),nc=1,byrow=TRUE)
#pte is the point estimate
#This uses whatever fun is
pte=apply(mat,2,fun)
abline(v=pte,lwd=3,col="Black")# Vertical line
segments(ci[1],0,ci[2],0,lwd=4) #Make the segment for the ci
text(ci[1],0,paste("(",round(ci[1],2),sep=""),col="Red",cex=3)
text(ci[2],0,paste(round(ci[2],2),")",sep=""),col="Red",cex=3)
# plot the point estimate 1/2 way up the density
text(pte,max(para$density)/2,round(pte,2),cex=3)
return(list(ci=ci,fun=fun,x=x))# Some output to use if necessary
}
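MS 7.128 - pg 367 Suppose y is a random sample of size n = 1 from a normal distribution with mean 0 and unknown variance \(\sigma^2\).
Show that \(y^2\)/\(\sigma^2\) has a chi-square distribution with 1 degree of freedom. (Hint: The result follows directly from Theorem 6.11.) Since \(y \sim N(0, \sigma^2)\), the standardized value \(z = y/\sigma\) has a standard normal distribution, and the square of a standard normal random variable has a chi-square distribution with 1 degree of freedom. Therefore \[\frac{y^2}{\sigma^2} = z^2\] has a chi-square distribution with 1 degree of freedom.
Derive a 95% confidence interval for \(\sigma^2\) using \(y^2\)/\(\sigma^2\) as a pivotal statistic. Writing \(\chi^2_{a}\) for the chi-square value (1 df) with upper-tail area \(a\): \[P\left(\chi^2_{1-\alpha/2} \leq \frac{y^2}{\sigma^2} \leq \chi^2_{\alpha/2}\right) = 1-\alpha\] \[\Rightarrow P\left(\frac{1}{\chi^2_{\alpha/2}} \leq \frac{\sigma^2}{y^2} \leq \frac{1}{\chi^2_{1-\alpha/2}}\right) = 1-\alpha \Rightarrow P\left(\frac{y^2}{\chi^2_{\alpha/2}} \leq \sigma^2 \leq \frac{y^2}{\chi^2_{1-\alpha/2}}\right) = 1-\alpha\] With \(\alpha = .05\), the 95% confidence interval is \[\left(\frac{y^2}{\chi^2_{.025}},\ \frac{y^2}{\chi^2_{.975}}\right) = \left(\frac{y^2}{5.024},\ \frac{y^2}{0.000982}\right)\]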
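The interval and its 95% coverage can be checked numerically. A sketch assuming a hypothetical true variance of 4 for the simulation; sigma2_ci is a hypothetical helper. Because qchisq() returns lower-tail quantiles, the upper-tail point \(\chi^2_{.025}\) is qchisq(0.975, 1) = 5.024 and \(\chi^2_{.975}\) is qchisq(0.025, 1) = 0.000982.

```r
# 95% CI for sigma^2 from one observation y ~ N(0, sigma^2),
# using the pivot y^2 / sigma^2 ~ chi-square with 1 df.
sigma2_ci <- function(y, conf = 0.95) {
  a <- 1 - conf
  c(lower = y^2 / qchisq(1 - a / 2, df = 1),  # divide by upper-tail point chi^2_{a/2}
    upper = y^2 / qchisq(a / 2, df = 1))      # divide by chi^2_{1 - a/2}
}

# Coverage check by simulation (hypothetical true variance sigma^2 = 4).
set.seed(2)
sigma2 <- 4
hits <- replicate(10000, {
  ci <- sigma2_ci(rnorm(1, mean = 0, sd = sqrt(sigma2)))
  ci[["lower"]] <= sigma2 && sigma2 <= ci[["upper"]]
})
mean(hits)  # proportion of intervals containing sigma^2; should be close to 0.95
```

The interval is very wide because it rests on a single observation.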
MS 8.24 - pg 390 Surface roughness of pipe. Refer to the
Anti-corrosion Methods and Materials (Vol. 50, 2003) study of the
surface roughness of coated interior pipe used in oil fields, Exercise
7.26 (p. 311). The data (in micrometers) for 20 sampled pipe sections
are reproduced in the table on p. 391.
Give the null and alternative hypotheses for testing whether the mean surface roughness of coated interior pipe, \(\mu\), differs from 2 micrometers. \(H_0: \mu = 2\) \(H_a: \mu \neq 2\)
The results of the test, part a, are shown in the MINITAB printout at the bottom of the page. Locate the test statistic and p-value on the printout.
x = c(1.72, 2.5, 2.16, 2.13, 1.06, 2.24, 2.31, 2.03, 1.09, 1.40, 2.57, 2.64, 1.26, 2.05, 1.19, 2.13, 1.27, 1.51, 2.41, 1.95)
t.test(x, mu=2)
##
## One Sample t-test
##
## data: x
## t = -1.0158, df = 19, p-value = 0.3225
## alternative hypothesis: true mean is not equal to 2
## 95 percent confidence interval:
## 1.635802 2.126198
## sample estimates:
## mean of x
## 1.881
t = -1.0158, p-value = 0.3225
knitr::include_graphics("ass4Q4.png")
Give the rejection region for the hypothesis test, using \(\alpha\) = .05. Reject \(H_0\) if \(|t| > t_{.025} = 2.0930\), the critical value of the t distribution with \(n - 1 = 19\) degrees of freedom.
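State the appropriate conclusion for the hypothesis test. Since \(|t| = 1.0158\) is less than the critical value 2.0930 (equivalently, p-value = 0.3225 > .05), we do not reject the null hypothesis; there is insufficient evidence that the mean surface roughness differs from 2 micrometers.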
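In Exercise 7.26 you found a 95% confidence interval for \(\mu\). Explain why the confidence interval and test statistic lead to the same conclusion about \(\mu\). The 95% confidence interval is (1.635802, 2.126198). A two-sided test at \(\alpha = .05\) rejects \(H_0: \mu = \mu_0\) exactly when \(\mu_0\) lies outside the 95% confidence interval, because both use the same sample mean, standard error and critical value \(t_{.025}\). Since the null value 2 is inside this interval, both approaches lead us to not reject the null hypothesis.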
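The critical value, test statistic and confidence interval can all be computed from their formulas, which makes the agreement between the test and the interval explicit. A minimal sketch using the 20 roughness values above; the variable names are illustrative.

```r
# One-sample t test of H0: mu = 2 built from its formulas (roughness data above).
x <- c(1.72, 2.5, 2.16, 2.13, 1.06, 2.24, 2.31, 2.03, 1.09, 1.40,
       2.57, 2.64, 1.26, 2.05, 1.19, 2.13, 1.27, 1.51, 2.41, 1.95)
mu0 <- 2
alpha <- 0.05
n <- length(x)
se <- sd(x) / sqrt(n)                     # estimated standard error of the mean
t_stat <- (mean(x) - mu0) / se            # test statistic
t_crit <- qt(1 - alpha / 2, df = n - 1)   # t_{.025} with 19 df
reject <- abs(t_stat) > t_crit            # rejection-region decision
ci <- mean(x) + c(-1, 1) * t_crit * se    # 95% confidence interval
in_ci <- ci[1] <= mu0 && mu0 <= ci[2]     # is the null value inside the CI?
c(t = t_stat, critical = t_crit)
c(reject = reject, mu0_in_ci = in_ci)
```

For a two-sided test, reject is TRUE exactly when mu0_in_ci is FALSE.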
MS 8.28 - pg 392 Dissolved organic compound in lakes.
The level of dissolved oxygen in the surface water of a lake is vital to
maintaining the lake’s ecosystem. Environmentalists from the University
of Wisconsin monitored the dissolved oxygen levels over time for a
sample of 25 lakes in the state (Aquatic Biology, May 2010). To ensure a
representative sample, the environmentalists focused on several lake
characteristics, including dissolved organic compound (DOC). The DOC
data (measured in grams per cubic meter) for the 25 lakes are listed in
the accompanying table. The population of Wisconsin lakes has a mean DOC
value of 15 grams/\(m^3\).
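# NOTE: 27 values are entered here, but the exercise describes 25 lakes (19.8 and 14.3 appear twice in a row); check against the table
x <- c(9.6,4.5,13.2,4.1,22.6,2.7,14.7,3.5,13.6,19.8,14.3,19.8,14.3,56.9,25.1,18.4,2.7,4.2,30.2,10.3,17.6,2.4,17.3,38.8,3.0,5.8,7.6)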
t.test(x, mu=15, conf.level = 0.9)
##
## One Sample t-test
##
## data: x
## t = -0.1232, df = 26, p-value = 0.9029
## alternative hypothesis: true mean is not equal to 15
## 90 percent confidence interval:
## 10.60169 18.80572
## sample estimates:
## mean of x
## 14.7037
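With p-value = 0.9029, well above \(\alpha\) = .10 (the level matching the 90% confidence interval), the result is not significant and we fail to reject \(H_0: \mu = 15\); the 90% confidence interval (10.60, 18.81) also contains 15. The sample mean DOC of 14.70 grams/\(m^3\) is consistent with the population mean, so there is no evidence that the sample is unrepresentative of Wisconsin lakes with respect to DOC.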
knitr::include_graphics("ass4Q5.png")
MS 8.44 - pg 401 Insecticides used in orchards.
Environmental Science & Technology (Oct. 1993) reported on a study
of insecticides used on dormant orchards in the San Joaquin Valley,
California. Ambient air samples were collected and analyzed daily at an
orchard site during the most intensive period of spraying. The thion and
oxon levels (in ng/\(m^2\)) in the air
samples are recorded in the table, as well as the oxon/thion ratios.
Compare the mean oxon/thion ratios of foggy and clear/cloudy conditions
at the orchard using a test of hypothesis. Use \(\alpha\) = .05.
knitr::include_graphics("ass4Q6.png")
# s <- sum(c(10.3,6.9,6.2,12.4,45.8,9.9,27.4,44.8,27.8,6.5,11.2,16.6))
# oxon <- c(10.3,6.9,6.2,12.4,45.8,9.9,27.4,44.8,27.8,6.5,11.2,16.6, s)
# thion <- c(38.2,28.6,30.2,23.7,62.3,74.1,88.2,46.4,135.9,102.9,28.9,46.9,44.3)
# t.test(oxon, thion, paired = TRUE)
m <- c(0.27,0.241,0.205,0.523,0.618,0.112,0.591,0.330,0.270,0.225,0.239,0.375)
t.test(m)
##
## One Sample t-test
##
## data: m
## t = 7.1371, df = 11, p-value = 1.9e-05
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.2304799 0.4360201
## sample estimates:
## mean of x
## 0.33325
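The one-sample test above shows only that the overall mean oxon/thion ratio differs from 0 (p < 0.05); it does not compare foggy and clear/cloudy days. To answer the question, the ratios must be split by weather condition (from the table) and compared with a two-sample test of \(H_0: \mu_1 - \mu_2 = 0\) at \(\alpha\) = .05, where \(\mu_1\) and \(\mu_2\) are the mean ratios under foggy and clear/cloudy conditions.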
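A two-sample comparison by weather condition can be sketched as follows. The weather labels for each day must be copied from the table in ass4Q6.png and are not reproduced here, so the example runs on clearly labeled hypothetical toy values. compare_ratios is a hypothetical helper; pooled = FALSE gives Welch's test and pooled = TRUE the equal-variance (pooled) t test.

```r
# Two-sample comparison of mean oxon/thion ratios by weather condition.
compare_ratios <- function(ratio, condition, alpha = 0.05, pooled = FALSE) {
  stopifnot(length(ratio) == length(condition))
  condition <- factor(condition)
  stopifnot(nlevels(condition) == 2)   # exactly two weather groups
  # var.equal = TRUE -> pooled t test; FALSE -> Welch t test
  res <- t.test(ratio ~ condition, var.equal = pooled)
  list(test = res, reject = res$p.value < alpha)
}

# Hypothetical toy data, NOT the study's values:
toy_ratio <- c(0.20, 0.25, 0.22, 0.27, 0.24, 0.50, 0.55, 0.48, 0.52)
toy_cond  <- c(rep("Fog", 5), rep("Clear/Cloudy", 4))
out <- compare_ratios(toy_ratio, toy_cond)
out$test$p.value
```

With the real data, the call would be compare_ratios(m, condition), where condition holds one label per day in the same order as m.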
MS 8.84 - pg 425 – This refers to 8.39 NOT 8.33! Cooling
method for gas turbines. Refer to the Journal of Engineering for Gas
Turbines and Power (Jan. 2005) study of gas turbines augmented with
high-pressure inlet fogging, Exercise 8.39 (p. 399). Heat rate data
(kilojoules per kilowatt per hour) for each of three types of gas
turbines (advanced, aeroderivative, traditional) are saved in the
GASTURBINE file. In order to compare the mean heat rates of two types of
gas turbines, you assumed that the heat rate variances were equal.
# Read in data
dird="~/Desktop/MainFolder/OuClasses/Spring 2023/Applied Statistical Methods/FALL224753wise0046/CourseData/Data-for-the-course/K25936_Downloads/Excel/"
library(readxl)
files = list.files(dird)
### Important Functions
myconvert = function(xl) {
if(stringr::str_ends(xl, "XLS") | stringr::str_ends(xl, "xls")){
v=try(readxl::read_xls(paste0(dird, xl)), silent = TRUE)
}
else{
v = NA
}
v
}
g <- myconvert("GASTURBINE.XLS")
traditional <- g[g$ENGINE == "Traditional",]$HEATRATE
aeroderiv <- g[g$ENGINE == "Aeroderiv",]$HEATRATE
advanced <- g[g$ENGINE == "Advanced",]$HEATRATE
t.test(traditional, aeroderiv)
##
## Welch Two Sample t-test
##
## data: traditional and aeroderiv
## t = -0.75036, df = 6.5099, p-value = 0.4793
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3224.118 1688.843
## sample estimates:
## mean of x mean of y
## 11544.08 12311.71
t.test(advanced, aeroderiv)
##
## Welch Two Sample t-test
##
## data: advanced and aeroderiv
## t = -2.5174, df = 6.2334, p-value = 0.044
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5001.26059 -93.59655
## sample estimates:
## mean of x mean of y
## 9764.286 12311.714
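Traditional vs. aeroderivative: the 95% interval for the difference in mean heat rates, (-3224.118, 1688.843), contains 0 and p = 0.4793 > 0.05, so we do not reject the null hypothesis of equal mean heat rates.
Advanced vs. aeroderivative: the interval (-5001.26, -93.60) does not contain 0 and p = 0.044 < 0.05, so we reject the null hypothesis of equal mean heat rates. These Welch tests compare means without assuming equal variances; checking the equal-variance assumption itself requires an F test of \(H_0: \sigma_1^2/\sigma_2^2 = 1\), where the relevant question is whether the F interval for the variance ratio contains 1.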
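The equal-variance assumption can be checked with an F test. A sketch using stats::var.test(); f_var_check is a hypothetical helper, and the demonstration runs on simulated data with clearly unequal spreads rather than the GASTURBINE values.

```r
# Two-sided F test of H0: sigma1^2 / sigma2^2 = 1 using var.test().
f_var_check <- function(x, y, alpha = 0.05) {
  vt <- var.test(x, y, ratio = 1, alternative = "two.sided",
                 conf.level = 1 - alpha)
  c(F = unname(vt$statistic),   # s1^2 / s2^2
    p = vt$p.value,
    lower = vt$conf.int[1],     # CI for sigma1^2 / sigma2^2
    upper = vt$conf.int[2])
}

# Illustration on simulated data (not GASTURBINE values).
set.seed(3)
sim <- f_var_check(rnorm(40, sd = 1), rnorm(40, sd = 4))
round(sim, 4)
```

With the vectors defined above, f_var_check(traditional, aeroderiv) and f_var_check(advanced, aeroderiv) would return the F statistic, p-value and interval for each comparison; equal variances are plausible only when that interval contains 1. Those results are not reproduced here.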
MS 8.99 - pg 438
knitr::include_graphics("ass4Q8.png")
Mongolian desert ants (continued). Refer to the Journal of Biogeography (Dec. 2003) study of ants in Mongolia (Central Asia), Exercise 8.98, where you compared the mean number of ants at two desert sites. Since the sample sizes were small, the variances of the populations at the two sites must be equal in order for the inference to be valid.
\(H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1\) \(H_a: \frac{\sigma_1^2}{\sigma_2^2} \neq 1\)
ants <- myconvert("GOBIANTS.XLS")
dry <- ants[ants$Region == "Dry Steppe",]$AntSpecies
gobi <- ants[ants$Region == "Gobi Desert",]$AntSpecies
t.test(dry, gobi)
##
## Welch Two Sample t-test
##
## data: dry and gobi
## t = 0.17926, df = 7.9859, p-value = 0.8622
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -25.71446 30.04779
## sample estimates:
## mean of x mean of y
## 14.00000 11.83333
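Rejection region (\(\alpha\) = .05): reject \(H_0\) if \(F = s_1^2/s_2^2 > F_{.025}(n_1-1,\,n_2-1)\) or \(F < 1/F_{.025}(n_2-1,\,n_1-1)\). The values 7.9859 and 0.17926 in the printout above are the Welch degrees of freedom and t statistic for comparing means, not F critical values.
The p-value of 0.8622 above belongs to the Welch t-test of equal means; the p-value for the variance ratio comes from an F test such as var.test(dry, gobi).
If the F-test p-value exceeds .05, we cannot reject the null hypothesis of equal variances.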
The F test requires independent random samples from normally distributed populations at the two site types.
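The rejection region for this F test can be computed with qf() rather than read from a table. A sketch; f_rejection_region and f_stat are hypothetical helpers, and the sample sizes 5 and 6 in the illustration are arbitrary example values. The lower critical value qf(alpha/2, n1-1, n2-1) equals \(1/F_{\alpha/2}(n_2-1, n_1-1)\), matching the table form above.

```r
# Two-sided rejection region for H0: sigma1^2 / sigma2^2 = 1 at level alpha.
f_rejection_region <- function(n1, n2, alpha = 0.05) {
  c(lower = qf(alpha / 2, n1 - 1, n2 - 1),      # reject if F is below this
    upper = qf(1 - alpha / 2, n1 - 1, n2 - 1))  # reject if F is above this
}

# Test statistic: ratio of the two sample variances.
f_stat <- function(x, y) var(x) / var(y)

# Illustration with n1 = 5 and n2 = 6 observations:
rr <- f_rejection_region(5, 6)
round(rr, 3)
```

With the data loaded above, the corresponding calls would be f_rejection_region(length(dry), length(gobi)) and f_stat(dry, gobi).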
MS 8.104 - pg 439 Real-time scheduling with robots.
Researchers at Purdue University compared human real-time scheduling in
a processing environment to an automated approach that utilizes
computerized robots and sensing devices (IEEE Transactions, Mar. 1993).
The experiment consisted of eight simulated scheduling problems. Each
task was performed by a human scheduler and by the automated system.
Performance was measured by the throughput rate, defined as the number
of good jobs produced weighted by product quality. The resulting
throughput rates are shown in the accompanying table. Analyze the data
using a test of hypothesis.
knitr::include_graphics("ass4Q9.png")
human <- c(185.4,146.3,174.4,184.9,240.0,253.8,238.8,263.5)
auto <- c(180.4,248.5,185.5,216.4,269.3,249.6,282.0,315.9)
t.test(human, auto)
##
## Welch Two Sample t-test
##
## data: human and auto
## t = -1.441, df = 13.897, p-value = 0.1717
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -81.06293 15.93793
## sample estimates:
## mean of x mean of y
## 210.8875 243.4500
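Treating the samples as independent, the Welch test's 95% interval for the difference in means, (-81.06, 15.94), contains 0 and p = 0.1717 > 0.05, so it does not reject the null hypothesis of equal mean throughput rates. However, each of the eight scheduling problems was performed by both the human scheduler and the automated system, so the observations are paired; the appropriate analysis is a paired t-test on the within-problem differences, t.test(human, auto, paired = TRUE).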
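Because each problem was run by both schedulers, a paired t-test on the differences (human - automated) is a natural analysis. A minimal sketch using the eight pairs from the table; on these data the paired statistic is about -2.63 with 7 degrees of freedom and a two-sided p-value between .02 and .05, so, unlike the Welch test, it rejects equal means at \(\alpha\) = .05.

```r
# Paired t test: differences within each of the eight scheduling problems.
human <- c(185.4, 146.3, 174.4, 184.9, 240.0, 253.8, 238.8, 263.5)
auto  <- c(180.4, 248.5, 185.5, 216.4, 269.3, 249.6, 282.0, 315.9)
paired <- t.test(human, auto, paired = TRUE)

# Same statistic by hand: mean difference over its standard error.
d <- human - auto
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

c(t = unname(paired$statistic), df = unname(paired$parameter), p = paired$p.value)
```

Pairing removes the large problem-to-problem variation in throughput from the error term, which is why the paired test is more sensitive here.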
myboot<-function(iter=10000,x,fun="mean",alpha=0.05,...){ #Notice where the ... is repeated in the code
n=length(x) #sample size
y=sample(x,n*iter,replace=TRUE)
rs.mat=matrix(y,nr=n,nc=iter,byrow=TRUE)
xstat=apply(rs.mat,2,fun) # xstat is a vector and will have iter values in it
ci=quantile(xstat,c(alpha/2,1-alpha/2))# Nice way to form a confidence interval
t=qnorm(1-alpha/2,mean=0,sd=1) # z critical value
m=mean(x)
se=sd(x)/sqrt(n) # estimated standard error of the sample mean
cit=c(m-t*se,m+t*se) # large-sample z interval for the mean, to compare with the bootstrap ci
# A histogram follows
# The object para will contain the parameters used to make the histogram
para=hist(xstat,freq=FALSE,las=1, col = "cyan",
main=paste("Histogram of Bootstrap sample statistics","\n","alpha=",alpha," iter=",iter,sep=""),
...)
#mat will be a matrix that contains the data, this is done so that I can use apply()
mat=matrix(x,nr=length(x),nc=1,byrow=TRUE)
#pte is the point estimate
#This uses whatever fun is
pte=apply(mat,2,fun)
abline(v=pte,lwd=3,col="Black")# Vertical line
segments(ci[1],0,ci[2],0,lwd=4) #Make the segment for the ci
text(ci[1],0,paste("(",round(ci[1],2),sep=""),col="Red",cex=2)
text(ci[2],0,paste(round(ci[2],2),")",sep=""),col="Red",cex=2)
text(cit[1],0.1,paste("(",round(cit[1],2),sep=""),col="Blue",cex=2) # label the z interval at its own endpoints
text(cit[2],0.1,paste(round(cit[2],2),")",sep=""),col="Blue",cex=2)
# plot the point estimate 1/2 way up the density
text(pte,max(para$density)/2,round(pte,2),cex=3)
return(list(ci=ci,fun=fun,x=x,t=t,cit=cit))# Some output to use if necessary
}
set.seed(35); sam<-round(rnorm(30,mean=20,sd=3),3)
myboot(x=sam)
## $ci
## 2.5% 97.5%
## 20.07910 22.22545
##
## $fun
## [1] "mean"
##
## $x
## [1] 23.195 20.399 19.898 19.865 30.014 18.821 21.232 18.313 23.574 21.047
## [11] 21.535 21.336 17.695 18.497 14.274 14.664 22.593 18.963 25.515 25.019
## [21] 22.053 22.871 23.006 23.829 19.038 21.735 21.461 21.659 21.703 21.049
##
## $t
## [1] 1.959964
##
## $cit
## [1] 20.05951 22.26403